12 research outputs found
CrossNorm: Normalization for Off-Policy TD Reinforcement Learning
Off-policy temporal difference (TD) methods are a powerful class of
reinforcement learning (RL) algorithms. Intriguingly, deep off-policy TD
algorithms are not commonly used in combination with feature normalization
techniques, despite positive effects of normalization in other domains. We show
that naive application of existing normalization techniques is indeed not
effective, but that well-designed normalization improves optimization stability
and removes the necessity of target networks. In particular, we introduce a
normalization based on a mixture of on- and off-policy transitions, which we
call cross-normalization. It can be regarded as an extension of batch
normalization that re-centers data for two different distributions, as present
in off-policy learning. Applied to DDPG and TD3, cross-normalization improves
over the state of the art across a range of MuJoCo benchmark tasks
Far Away in the Deep Space: Dense Nearest-Neighbor-Based Out-of-Distribution Detection
The key to out-of-distribution detection is density estimation of the
in-distribution data or of its feature representations. This is particularly
challenging for dense anomaly detection in domains where the in-distribution
data has a complex underlying structure. Nearest-Neighbors approaches have been
shown to work well in object-centric data domains, such as industrial
inspection and image classification. In this paper, we show that
nearest-neighbor approaches also yield state-of-the-art results on dense
novelty detection in complex driving scenes when working with an appropriate
feature representation. In particular, we find that transformer-based
architectures produce representations that yield much better similarity metrics
for the task. We identify the multi-head structure of these models as one of
the reasons, and demonstrate a way to transfer some of the improvements to
CNNs. Ultimately, the approach is simple and non-invasive, i.e., it does not
affect the primary segmentation performance, refrains from training on examples
of anomalies, and achieves state-of-the-art results on RoadAnomaly,
StreetHazards, and SegmentMeIfYouCan-Anomaly.Comment: Workshop on Uncertainty Quantification for Computer Vision, ICCV
2023. Code at: https://github.com/silviogalesso/dense-ood-knn
Latent Diffusion Counterfactual Explanations
Counterfactual explanations have emerged as a promising method for
elucidating the behavior of opaque black-box models. Recently, several works
leveraged pixel-space diffusion models for counterfactual generation. To handle
noisy, adversarial gradients during counterfactual generation -- causing
unrealistic artifacts or mere adversarial perturbations -- they required either
auxiliary adversarially robust models or computationally intensive guidance
schemes. However, such requirements limit their applicability, e.g., in
scenarios with restricted access to the model's training data. To address these
limitations, we introduce Latent Diffusion Counterfactual Explanations (LDCE).
LDCE harnesses the capabilities of recent class- or text-conditional foundation
latent diffusion models to expedite counterfactual generation and focus on the
important, semantic parts of the data. Furthermore, we propose a novel
consensus guidance mechanism to filter out noisy, adversarial gradients that
are misaligned with the diffusion model's implicit classifier. We demonstrate
the versatility of LDCE across a wide spectrum of models trained on diverse
datasets with different learning paradigms. Finally, we showcase how LDCE can
provide insights into model errors, enhancing our understanding of black-box
model behavior
Compositional Servoing by Recombining Demonstrations
Learning-based manipulation policies from image inputs often show weak task
transfer capabilities. In contrast, visual servoing methods allow efficient
task transfer in high-precision scenarios while requiring only a few
demonstrations. In this work, we present a framework that formulates the visual
servoing task as graph traversal. Our method not only extends the robustness of
visual servoing, but also enables multitask capability based on a few
task-specific demonstrations. We construct demonstration graphs by splitting
existing demonstrations and recombining them. In order to traverse the
demonstration graph in the inference case, we utilize a similarity function
that helps select the best demonstration for a specific task. This enables us
to compute the shortest path through the graph. Ultimately, we show that
recombining demonstrations leads to higher task-respective success. We present
extensive simulation and real-world experimental results that demonstrate the
efficacy of our approach.Comment: http://compservo.cs.uni-freiburg.d